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Abstract 

In this paper, a novel dual estimation methodology is developed for both time- varying parameters 
and states of a nonlinear stochastic system based on the Recursive Prediction Error (RPE) concept and 
the Particle Filtering (PF) scheme. Our developed methodology in based on a concurrent implementation 
of state and parameter estimation filters as opposed to using a single filter for simultaneously estimating 
the augmented states and parameters. The convergence and stability of our proposed dual estimation 
strategy are shown formally to be guaranteed under certain conditions. The proposed dual estimation 
framework is then utilized for addressing the challenging problem of fault diagnosis of nonlinear 
systems. The performance capabilities of our proposed fault diagnosis methodology is demonstrated 
and evaluated by its application to a gas turbine engine through accomplishing state and parameter 
estimation under simultaneous and concurrent component fault scenarios. The health parameters of the 
system are considered to be slowly time- varying during the engine operation. Extensive simulation results 
are provided to substantiate and justify the superiority of our proposed fault diagnosis methodology when 
compared with another well-known alternative diagnostic technique that is available in the literature. 



I. INTRODUCTION 

Systems state estimation is a fundamental problem in control, signal processing, and fault 
diagnosis fields [1]. Investigations on both linear and nonlinear state estimation and filtering in 
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stochastic environments have been an active area of research during the past several decades. 
Linear state estimation methods use a simpler representation of an actual nonlinear system and 
can provide an acceptable performance only locally around an operating point and in the steady 
state operational condition of the system. However, as nonlinearities in the system dynamics 
become dominant, the performance of linear approaches deteriorates and linear algorithms will 
not necessarily converge to an accurate solution. Although an optimal state estimation solution for 
linear filtering methods exists, nonlinear filtering methods suffer from generating sub-optimal or 
near-optimal solutions. Consequently, investigation of nonlinear estimation and filtering problems 
remain a challenging research. 

Numerous studies have been conducted in the literature to solve and analyze standard nonlinear 
filtering problems [2]-[8]. These methods can be broadly categorized into [7]: (a) lineariza- 
tion methods (extended Kalman filters (EKF)) [2], (b) approximation methods using finite- 
dimensional nonlinear filters [3], (c) particle filter (PF) methods as one of the most popular 
Bayesian recursive methods for state estimation [4], (d) classical partial differential equation 
(PDE) methods for approximating a solution to the Zakai equation [5], (e) Wiener chaos ex- 
pansion methods [6], (f) moment methods [7], and (g) high dimensional nonlinear Kalman filter 
methods known as the Cubature Kalman filters [8]. 

One of the most important recent applications of nonlinear filtering methods is in the area of 
fault diagnosis of dynamical systems that can include fault detection, isolation, and identification 
(FDII) modules. Diagnosis methods that are based on linearization techniques suffer from poor 
detection and high rates of false alarms. Therefore, Monte Carlo filtering approach based on 
particle filters was first proposed in [9] to address the fault detection and isolation problem in 
nonlinear systems. In this work, the negative log-likelihood, which is calculated for a predefined 
time window, is considered as a measure for the fault detection. The fault isolation was achieved 
by using the augmentation of the fault parameters vector to the system states to perform the 
estimation task. However, the augmented state space model tends to increase the dimensionality 
of the model and as a result increases the number of required particles for achieving a sufficiently 
accurate result. For decreasing the computational burden of this method, the augmented model 
is used only after the fault detection stage and for only the fault isolation stage. An external 
covariance adjustment loop was added to this augmented model in [10] to enable the estimation 
algorithm to track changes in the system parameters in case of fault occurrences. The combination 



June 21, 2015 



DRAFT 



3 



of a particle filtering algorithm and the log-likelihood ratio (LLR) test in the multiple model 
environment, has led to the development of sensor/actuator FDI scheme in [1 1] for a general class 
of nonlinear non-Gaussian dynamical systems but with the assumption of full state measurements. 
The fault detection problem recently is addressed for a mobile robot based on the combination 
of the negative LLR test and particle filtering approach in [12]. However, both of the methods 
in [11], and [12] suffer from the high computational burden for on-line implementation of the 
algorithms. Hence, the idea of parallelized particle filters for on-line fault diagnosis is introduced 
in [12] to improve the performance of the algorithm. 

A PF-based robust navigation approach was proposed in [13] to address multiple and si- 
multaneous faults occurrences in both actuators and sensors in an underwater robot where an 
anomaly is modeled by a switching-mode hidden Markov system. The component and actuator 
fault detection and isolation of a point mass satellite was tackled in [14] by introducing several 
particle filters that run in parallel and each rejects a different subset of the faults. 

Generally, the main issues with applying standard particle filters to the problem of fault 
diagnosis problem can be stated as follows [15]: (i) False diagnosis decisions due to low 
probabilities of transitions to fault states when there are fewer samples of states, and (ii) 
The exponential growth of the required samples for accurately approximating the a posteriori 
distributions as dimensionality of the estimation problem increases. The risk-sensitive PF is 
introduced to address the first problem and the variable resolution PF is developed to overcome 
the second problem in [16]. Moreover, the Gaussian PF (GPF) is also introduced in [17] as 
an efficient algorithm for performing fault diagnosis of hybrid systems faster than traditional 
methods that are based on PFs. Finally, the sample impoverishment problem in particle filters 
due to fault occurrence in a hybrid system is addressed in [18]. The developed algorithm enables 
the PF method to be implemented by fewer number of particles even under faulty conditions. 

In this work, our objective is to address the aforementioned problems by using particle filters. 
This is accomplished through developing a PF-based dual state/parameter estimation algorithm 
for the component fault diagnosis problem. The developed dual estimation scheme relies on an 
on-line estimation of the system states as well as the health parameters that are not necessarily 
fixed and their time variations are generated by the fault vector that affects them. 

The on-line estimation of the system time-varying parameters by using particle filters is a 
challenging and active area of research. There are two main classes of PF-based parameter 
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estimation algorithms (for on-line as well as off-line implementations) [19] known as Bayesian 
and maximum likelihood (ML) approaches. In the Bayesian approach, a prior distribution is 
considered for the unknown parameters and the posterior distribution of the parameters is approx- 
imated given the observations [20], [21], whereas in the ML approach the estimated parameter is 
the maximizing argument of the likelihood function given the observations [22]-[25]. In the ML 
framework for parameter estimation, the maximization of any cost function can be performed 
based on gradient-based search methods [22]. On the other hand, expectation maximization 
(EM) methods are only applicable for maximization of the likelihood functions [25]. However, 
EM methods are not suitable for on-line applications due to their high computational cost for 
implementation. The recursive maximum likelihood method (RML) is recognized as a promising 
method for on-line parameter estimation based on a stochastic gradient algorithm [23]. In order 
to avoid the direct computation of the likelihood function gradient, an alternative method is 
proposed in [26] that is known as the gradient-free ML parameter estimation. Despite the above, 
the on-line ML methods suffer from the practical point of view of slow convergence rates and 
requiring large number of particles to achieve accurate estimates [27]. 

In the Bayesian framework, on-line implementation of particle filter-based parameter esti- 
mation algorithms are computationally intensive [28]. A general method that is capable of 
simultaneously estimating the static (i.e. , constant or fixed) parameters and time-varying states 
of a system is developed in [29]. The work is based on the sequential Monte Carlo (SMC) 
method in which an artificial dynamic evolution model is considered for the unknown model 
parameters. In order to overcome the degeneracy concerns arising from the particle filtering, 
kernel smoothing technique as a method for smoothing the approximation of the parameters 
conditional density has been utilized in [20]. The estimation algorithm is further improved by 
re-interpretation of the artificial evolution algorithm according to the shrinkage scaling concept. 
However, the proposed method in [20] is only applicable for estimating fixed parameters of the 
system and it uses the augmented state/parameter vector for the estimation task. 

In our proposed dual state/parameter estimation method, we take advantage of the fast con- 
vergence rate of parameter estimation algorithms in Bayesian framework. The associated com- 
putational cost of our estimation algorithm by augmentation of the parameters with the states is 
addressed by designing two concurrent filters for states and parameters estimation. Consequently, 
there is no need to increase the number of particles despite the fact that the dimension of the 
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estimation problem increases by augmenting the parameter estimates to the filter. Our main goal 
is to extend the framework proposed in [20] for both state and parameter estimations by invoking 
our proposed modified artificial evolution law and the recursive prediction error (RPE) concept. 
This modification enables the algorithm to track and estimate the time-varying parameters of 
the system. The RPE is used for both off-line and on-line system identification by utilizing a 
quadratically convergent scheme [30], [31]. The concept of the kernel mean shrinkage is also 
used in our proposed scheme to avoid overdispersion in the variance of the estimated parameters. 

Component fault diagnosis of dynamical systems can be achieved through dual state/parameter 
estimation methods [32], [33] by first detecting the faulty competent and then by determining its 
severity and isolating its location. In this work, our proposed dual state and parameter estimation 
strategy is applied for performing component fault diagnosis of a gas turbine engine [34]. This 
is accomplished by utilizing a multiple-model approach as developed in [34], [35]. Although 
the multiple-model approach can detect and isolate faults based on predefined faulty models, 
the utilization of parameter estimation schemes in our proposed dual state/parameter estimation 
enables the diagnostic scheme to not only detect and isolate but also accurately identify the type 
and severity of simultaneous fault scenarios in the gas turbine system. 

Based on the above discussion, the main contributions of this paper are to utilize nonlinear 
Bayesian and Sequential Monte Carlo (SMC) methods to develop, design, analyze, and implement 
a unified framework for both the state and parameter estimation as well as fault diagnosis 
problems of nonlinear systems. Our methodology is based on solving the Bayesian recursive 
relations through SMC methods. An on-line parameter estimation scheme is developed based on 
the recursive prediction error (RPE) method by using the particle filters (PF) approach. Specifi- 
cally, by using the prediction error to correct time-varying changes in the system parameters, a 
novel method is proposed for parameter estimation of nonlinear systems based on the PF. In the 
implementation of our proposed scheme, a dual structure for both state and parameter estimation 
is developed within the PF approach. In other words, the hidden states, and variations of the 
system parameters are estimated through two concurrent filters. Convergence and stability of our 
proposed dual estimation strategy are shown to be guaranteed formally under certain conditions. 

The remainder of this paper is organized as follows. In Section II, the statement of the 
nonlinear filtering problem is presented. Our proposed dual state/parameter estimation scheme 
is developed in Section III, in which state and parameter estimation methods are first developed 
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concurrently and subsequently integrated together for simultaneously estimating the system 
states and parameters. The stability and convergence properties of the proposed schemes under 
certain conditions are also provided in Section III. Our proposed fault diagnosis framework and 
formulation are also provided in Section III. In Section IV, extensive simulation results and 
case studies are provided to demonstrate and justify the merits of our proposed method for fault 
diagnosis of a gas turbine engine under simultaneous and concurrent component faults. Finally, 
the paper is concluded in Section V. 

II. PROBLEM STATEMENT 

The problem under consideration is to obtain an optimal estimate of states as well as time- 
varying parameters of a nonlinear system whose dynamics is governed by a discrete-time stochas- 
tic model, 

fft+i = ft(xuO u u t ), (1) 

y t = h t (x u 9 t ) + v u (2) 

where x t G R Ux is the system state, t G N, f t : R Ux x R n ° x R Uuj — > R Ux is a known nonlinear 
function, 9 t G R ne is an unknown and possibly time-varying parameter vector governed by 
an unknown dynamics. The function h t : R Ux x R ne — > R ny is a known nonlinear function 
representing the map between the states, parameters and the system measurements, and cj t and 
v t are uncorrected stochastic process and measurement noise sequences with covariance matrices 
L t and V t , respectively. The following assumption is made regarding the dynamical system (1) 
and (2). 

Assumption Al. The vector {x u 9 t } ranges over a compact set denoted by D^, for which 
the functions jt{x u 9 t) uj t ) and h t {x t) 9 t ) are continuously differentiable with respect to the state 
x t as well as the parameter 9 t 

The main objective of the dual state and parameter estimation problem is to approximate the 



following conditional expectations: 

E(0i(x t )|yi :t ,0 t _i) = J </>i(xt)p(xt\yi:uO t -i)dx u (3a) 

Hfo(9 t )\y 1:U x t ) = J h(0 t )p(9 t \y 1:U x t )d9 u (3b) 
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where y 1:t = (2/1? 2/2? - denotes the available observations up to time t, : R nx —> M and 
02 : M 720 —> M are functions of states and parameters, respectively, that are to be estimated. The 
conditional probability functions p(x t \yi :t ,6 t -i)dx t and p(0 t \yi :t ,x t )d6 t are to be approximated 
by the designed particle filters (PFs) through determining the filtering distributions according to 

N 

PN(xt\yi:t,O t -i)dx t = ^2w^ t 5 x (i)(dx t ), 

(4) 

PN(O t \y 1:t ,x t )d0 t = ^w^5 Q u){de t ), 
where the subscript TV in p N (.) implies that the state/parameter conditional probability distribu- 

(i) (i) 

tions are obtained from N particles. Each state particle x\ J has a weight Wx( and each parameter 
particle 9^ has a weight w^ t \ where 5(.) denotes the Dirac-delta function mass that is positioned 
at x t or 0 t . 

Based on the approximations used in equation (4), our goal is to address the convergence 
properties of the subsequently designed estimators to their true optimal estimates and also to 
develop and demonstrate under what conditions this convergence remains valid. 

III. Proposed Dual State/Parameter Estimation and Fault Diagnosis 

Framework 

In this section, the main theoretical framework for our proposed dual state/parameter filtering 
as well as the fault diagnosis methodology of the nonlinear system (1) and (2) are introduced 
and developed. 

A. Dynamic Model in Presence of Time-Varying Parameters 

Our first task is to represent the model (1) and (2) into another framework for our subsequent 
theoretical developments. Let (f^J^P) denote the probability space on which the three real 
vector-valued stochastic processes X = {X t ,t = 1,2,...}, 6 = {0t,£ = 1,2,...}, and Y = 
{Y t) t = 1,2,...} are defined. The n x -dimensional process X describes the evolution of the hidden 
states, the no -dimensional process 9 describes the evolution of the hidden system parameters 
that are conditionally independent of the states, and the n y -dimensional process Y denotes the 
observation process of the system. 
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The processes X and 6 are Markov processes with the associated initial state and parameter 
X 0 and B 0 , respectively. They are drawn from the initial distributions h Xq (&xq) and 7T0 o (d0o)> 
respectively. The dynamic evolution of states and parameters are modeled by the Markov tran- 
sition kernels K x (dx t \x t -i,6 t -i) and K e (d6 t \6 t -i,x t ), that also admit densities with respect to 
the Lebesgue measure \ such that 

P(X t eA 1 \Xt-i=xt- U e t - 1 = 0t-i)= I K x (xt\x t -uOt-i)dx t , (5) 

P(e t eA 2 \et-i=0t-uX t = xt)= [ K e (0t\e t -uxt)d0 u (6) 

Ja 2 

for all A x e B(R Ux ) and A 2 e B(R ne ), where B(R Ux ) and B(R n °) denote the Borel a-algebra 
on R Ux and R ne , respectively. The transition kernel K x (x t \x t -i, 6 t -\) is a probability distribution 
function (pdf) that follows the pdf of the stochastic process in process (1). The probability density 
function for approximating the parameter kernel transition K e (0 t \6 t -i,x t ) is to be provided in 
the subsequent subsections. 

Given the states and parameters, the observations Y t are conditionally independent and have 
the marginal distribution with a density with respect to the Lebesgue measure as given by, 

P(Y t £ B\X t = x u Q t = 6 t ) = J B p(y t \xtA)dyu (7) 

where p(y t \x t) 6 t ) is a probability density function that follows the probability density function 
of the stochastic process in equation (2). 

In the dual state/parameter estimation framework, at first the state x t is estimated (which is 
denoted by x t \ t ). The estimated value at time t is then used to estimate the parameter 9 t at 
time t (which is denoted by 9 t \ t ). In the Bayesian framework for parameter estimation, the prior 
evolution of parameters are not specified, therefore it is necessary to consider a given evolution 
for the parameters in order to design an estimation filter. In our proposed dual structure for 
the state estimation filter, first the parameters are assumed to be constant at time t — 1 at their 
estimated value O t _i\ t _i, and then for the parameter estimation filter they are evolved to the 
next time instant by applying an update law that is based on the recursive prediction error 
(RPE) method. The details regarding our proposed methodology are presented in the subsequent 
subsections in which the filtering of states and parameters are fully described and developed. 

! The transition kernel K(dx t \x t -i) admits density with respect to the Lebesgue measure if one can write P(X t G dx t \X t -i = 
x t -i) = K(dx t \x t -i) = K(x t \x t -i)dx t . 
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Fig. 1. The schematic of the dual particle filter. 



B. The Dual State/Parameter Estimation Framework 

In our proposed dual state/parameter estimation framework, two filters are runing concurrently. 
At every time step, the first PF-based state filter estimates the states by using the current available 
estimate of the parameters, O t -i\t-i, whereas the second PF-based parameter filter estimates the 
unknown parameters by using the current estimated states, x t \ t . The developed schematic is 
shown in Figure 1. 

In our dual estimation framework, the well-known maximum a posteriori (MAP) solution 
corresponding to the marginal estimation methods based on the decoupled approach is used 
for solving the dual estimation problem [36]. In this method, the joint state/parameter marginal 
density p{x t , 9 t \yi :t ) is expressed as 

p{?u 9 t \y 1:t ) = p(x t \9 u yi:t)p(O t \y 1:t ), (8) 

where p{x t \9 t) y Vmt ) and p(O t \yv.t) denote the state and parameter marginal densities, respectively. 
Assuming that the variations of parameters are slow when compared to the system state time 
variations, one can use the approximation 9 t « 9 t _ l9 so that the joint marginal density is 
approximated as 

p(x u 0 t \yi:t) ~ p(xt\0t-u yi:t)p(Ot-i\yi:t). (9) 

Our ultimate goal is to maximize the two marginal distribution terms in expression (9) separately 
according to the decoupled approach in [36] as follows 

x t \ t = argmax^p(x t |(9t_i, y 1:t ), 9 t \ t = argmax^_ l p(6> t _i|yi :t ). (10) 
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In the above decoupled methodology, one attribute is optimized at a time by keeping the 
other attribute is fixed and then alternating them. Associated with optimization of both marginal 
distributions, different cost functions can be chosen [36]. For developing a dual extended Kalman 
filter, corresponding to specific cost functions of the parameter marginal density, various esti- 
mation methods have been proposed in the literature [36], [37]. For example, the maximum 
likelihood (ML) and prediction error approaches are selected for marginal estimations. The main 
motivation for choosing these two approaches is due to the fact that one considers to maximize 
only the marginal density p(0t-i\yi:t) as opposed to the joint density p(x t , 6 t -i\yi : t). However, in 
order to maximize the parameter marginal density, it is also necessary to generate state estimates 
that are produced by maximizing the state marginal density p(x t \0t-uyi:t)- 

It should be noted that in marginal estimation methods no explicit cost function is considered 
for maximization of the state marginal distribution, since the state estimation is only an implicit 
step in marginal approaches and the joint state/parameter cost is used that may have variety of 
forms in different filtering algorithms [36]. In our proposed dual particle filtering framework, 
p(x t \9t-i,yi:t) i s approximated by the state filtering distribution p N (x t \6 t -i, yi :t ) from equation 
(4). Next, the prediction error cost function is chosen for maximization of the parameter marginal 
density, where this cost function is implemented in a recursive manner (RPE) in order to attain 
a less computational cost [30]. 

In the subsequent subsections, specific details regarding the concurrent state and parameter 
estimation filters design and development are provided. 

C. The State Estimation Problem 

For designing the state and parameter estimation filters, our main objectives are to approxi- 
mate the integrals in equations (5) and (6) by invoking the particle filter (PF) scheme as well 
as to approximate the estimate of the conditional state and parameter distributions. Consider 
TTx^^dxt) = J Rnx 7r Xt _ llt _ 1 (dx t - 1 )K x (dx t \x t - 1 ,9 t - 1 ) to denote the a priori state estimation 
distribution before the observation at time t becomes available, and 7T6» t _ 1|t _ 1 (d9 t -i) to denote 
the marginal distribution of the parameter at time t — 1. The a posteriori state distribution after 
the observation at the instant t becomes available is obtained according to the following rule, 
7r Xtlt (dx t ) oc p^lxt^t-O^^^dxtjTr^.^^d^-i). (11) 
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In the above it is assumed that 9 t _i\ t _i is known for this filter. Therefore, the last distribution 
in the right hand side of equation (11) is set to one. 

The particle filter (PF) procedure for implementation of the state estimation and for deter- 
mining 7r Xtlt (dx t ) consists of two main steps, namely (a) the prediction step (time update step), 
and (b) the measurement update step. Consider one states in the TV particles at time t. The 
prediction step utilizes the knowledge of the previous distribution of the states as well as the 
previous parameter estimate, these are denoted by {x^_^ t _ v i = 1, TV} (corresponding to TV 
estimated state particles that follow the distribution 7v Xtl]tl (dx t -i)) and 0 t -i|t-i> respectively, as 
well as the process model given by equation (1). In other words, the prediction step is explicitly 
governed by the following equations for i = 1, TV, namely 



x 



t\t-l 



f t (xl\_ v 6 t . llt . u ujn, (12a) 
^-^^(^.p^-iit-i), (12b) 

(<) 1 N (<) (*) 1 N « T 

2=1 2=1 

where uff* denotes the process noise related to each particle x^ t _ x and is drawn from the noise 
distribution with the probability distribution function f>^(-)> and x^ t _ x denotes the independent 
samples generated from equation (12a) for i = 1, N particles. Moreover, denotes the 

(i) 

independent samples of the predicted outputs that are evaluated at samples, and 5H^ t|t _ 1 

denotes the a priori state estimation covariance matrix. 

For the first step, the one-step ahead prediction distribution known as the a priori state 
estimation distribution is now given by, 

2=1 

For the second step, the information on the present observation y t is used. This results in 
approximating 7r Xt]t (dx t ), where 9 t -i\t-i is considered to be given from a parameter estimation 
filter and obtained from the distribution irj^ ti (d9 t -i). Consequently, the particle weights w x } 
are updated by the likelihood function (the importance function) according to w x } ~ p Vt {lit — 
= p{yt\^t\ t _ v G t -i\t-i), where p Ut (.) denotes the probability distribution function of the 

(i) 

additive noise of the output and is evaluated at y t — y t ^_ v 

In this work, since our ultimate goal is in developing a fault diagnosis algorithm that is 
practically stable, the structure of regularized particle filters (RPF) is chosen that has a better 
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performance in cases that the sample impoverishment is severe, that is quite common and almost 
the case in all practical applications [38]. This characteristics of the RPFs are related to the 
fact that they are capable of transforming the discrete-time approximation of the a posteriori 
state estimation distribution tt^ (dx t ) into a continuous-time one. Consequently, the resampling 
step is modified in such a manner that the new resampled particles are obtained from an 
absolutely continuous-time distribution with TV different locations x^ t from that of x^ t _ x [39]. 
Therefore, where the probability for taking the fc-th particle is P(x[^ t = = — 

p(ytk t[t — ^ ^(k) ^ ^ _ 2 5>>>5 7V denotes the normalized particle weights. In 

T,k=i p(yt\x t \t-i> e t-i\t-i) 

other words, the particle selection in the resampling step is performed for particles that have 
higher probabilities of p Ut (y t - 

For the resampling step, two main choices can be considered that are known as (i) Bayesian 
bootstrap, and (ii) Sampling importance resampling (SIR) [4]. Although both approaches are 
applicable for this filter, the bootstrap method is chosen in this paper. Therefore, the a posteriori 
state estimation distribution is approximated by tt^ {dx t ) before one performs the resampling 
by using the RPF structure [39], and by 7r^(dx t ) after one performs the resampling that is 
provided below, 



1=1 i=l 



b 



Ei=iP(2/tFi| t -p^-i|t-i) 



1 N 1 N 

< |t (d^ t ) = - J>^)(dx t ) x t \t = ^J2 £ tl 



i=\ i=i 
where x T t eSl , 1 = 1, ..,N ieg denotes the regularized state vector that is evaluated at jV re g points 

that are obtained from the absolutely continuous-time distribution of the particles as given by 

X , - [r (1) t {n) 1 
^t\t-i — L x t|t-i' —i x t|t-iJ' 

x\ egl = min(X t i t _i) - std(X t i t _i), x] egNree = max(X t \ t -i) + std(X t i t _i), 

(15) 

d^ = (4 egjvreg -xr gi )/(iv reg -i), 

x t = x t + dx g , / = 2, A^ re g, 

where std denotes the first standard deviation of the particles from their mean. Hence, {x^}^ 
is obtained from the continuous-time distribution through the regularization kernel K that is 



June 21, 2015 



DRAFT 



13 



considered to be a symmetric density function on R nx [39]. The matrix A t in equation (14) is 
chosen to yield a unit covariance value in the new x^ t population and A t Aj = E# t|t _ x . The 
constant 6 denotes the optimal bandwidth of the kernel, and x t \t denotes the a posteriori state 
estimation at time t. 

We are now in a position to introduce our overall particle filter (PF) scheme for implementing 
the state estimation filter. Our goal for proposing this algorithm is to ensure that an approximation 

to E((/>(x t )\yi:u 0 t -i) by <j>(x t ) = x t takes x t \ t ~ ^(dx t ) = ± YaLi S^)(dx t ) 9 where ^ |t (dx t ) 
denotes the a posteriori distribution of {x^ t }f =l (after the resampling from that is 

given by x t \t = Y^Li %t\t m ^ e est i ma ted output from the state estimation filter is also given 

by vt = ^(x t | t ,^-i|t-i). 

The State Estimation Particle Filter Scheme 

1) Initialize the PF scheme with TV particles, {x^}f =1 ~ 7r Xo (dx 0 ) and the parameters 9 0 (the 
mean of the parameter initial distribution 7T0 o (d0o))- 

2) Draw cj^ ~ Pcj t (.)> where p Ut (.) denotes a given distribution for the process noise in the 
filter, and then predict the state particles according to equation (12a). 

3) Compute from equation (12b) to obtain the importance weights {wxt}^=i as w *l = 
P(yt\%t\t-v °t-i\t-i), i = 1, N, and normalize them to w% = ™ xt (i) . 

I l^i = l w x t 

4) Resampling: Draw N new particles with the replacement for each i = 1, N, according to 
P(%t\t = %t\t-i) = fc = 1, AT, from the regularized kernel /C where x^ t ~ (dx t ) 
as given by equation (14). 

5) Calculate x t \t from the conditional distribution that is given by equation (14), 

( dx t) = jf E£Li ^w(d^) with equally weighted xf t as % = ^ ^=i 

6) Update the parameters from the parameter estimation filter (to be specified in the next 
subsection). 

7) Set t := t + 1 and go to Step 2. 

Following the implementation of the above state estimation filter, the parameter estimation 
filter that is utilized for adjusting the parameters is now described in detail in the next subsection. 
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D. The Parameter Estimation Problem 

One of the main contributions of this paper is to develop a novel PF-based parameter estimation 
filter within our proposed dual state/parameter estimation framework by utilizing the recursive 
prediction error (RPE) concept. For this methodology it is assumed that the a priori distribution 
of the time-varying developed is not known. Moreover, the estimated states that are generated 
by the state estimation filter provided in the previous subsection will be used. Therefore, it is 
imperative that one considers a dynamical model associated with the parameters evolution in 
order to estimate the density function 7To tlt (d9 t ). 

The most common dynamical model that is considered for the parameter propagation (in 
case of the system with constant parameters) is the conventional artificial evolution law. In this 
representation small random disturbances are added to the state particles (parameters) between 
each consecutive time step [29]. However, in our work, the conventional update law for the 
parameters is modified to include the output prediction error as a multiplicative term to allow 
one to deal with time variations in the parameters that can affect the system output. 

In order to derive the parameter update law, an algorithm based on the recursive prediction 
error (RPE) method is proposed by minimizing the expectation of a quadratic performance index 
J(6 t -i) with respect to 0 t -\. This is due to the fact that our parameter estimation algorithm for 
obtaining the distribution of the a posteriori parameter estimate is based on the kernel smoothing 
that uses the shrinkage of the particle locations. This method attempts to force the particles 
towards their mean from the previous time step, i.e. the estimated value of 0 t _i, and is denoted 
by O t -i\t-i (before adding noise to the particles). This is also used in the state estimation filter 
for approximating x t \ t . Therefore, our goal is to investigate the convergence properties of 4-i|t-i 
whose boundedness ensures the boundedness of 9 t \ t . Towards this end, the performance index 
is now selected as E( J{O t -i)\yi:t-i, %t) = J J(0t-i)p(0t-i\yi:t-i, x t )d9 t -i, where the integral is 
approximated in the PF by E( J(0 t -i)|j/i:t-i, x t ) » £ £)JLi J(^H?i|t-i)- 

The term J(0{i\| t _ 1 ) now represents a quadratic function of the output prediction error related 
to each particle j, j = 1, N. The prediction error is now defined according to e(t, 0{l\| t _ 1 ) = 
e P = Vt — h(£t\t,d¥-i\t-i) 9 w here denotes the particle related to the estimated value 

of the parameter whose true value is denoted by 6£_ x (this is clearly assumed to be unknown). 
Therefore, we define J{0^\ t _^) = \ Y^r^t-K ^(Q( e ( r > ^-i|t-i)))' m which the expectation is 
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taken over the observation sequence of k samples. Let us now select the quadratic criterion 

Q(*M?-i|t-i)) as 

Q«t, ^1,-x)) = \<t, e^yit, e^). (16) 

The following modified artificial evolution law is now proposed for the parameter update in 
the particle filters for generating j = 1, N parameter particles that correspondingly determine 
the distribution from which the a priori parameter estimate is considered to be the same 
as 0^}^ t _ v and the a posteriori parameter estimate is obtained in two steps that are denoted by 
and 9^ 9 respectively. In the first step one gets (the second step is described on the next 
page) 

1 N 

0$ = Am? + A)m t -i + m-i = §i t%-v (17b) 

where fa = . dijt = dht ^f^-^\ w hich when evaluated at $2 Uf _, is denoted by i$\ 7t 
denotes the step size design parameter, (? ~ A/*(0, (J — A 2 )V^ J denotes the zero-mean 
normal increment particles to the parameter update law at each time step with the covariance 
matrix (I — A 2 )V^ i through the use of the kernel smoothing concept, A denotes the shrinkage 
matrix, and V§ t i denotes the covariance of the parameter estimates in the previous time 
step t — 1. The kernel shrinkage algorithm attempts to force the distribution of the parameter 
particles towards the mean of their distribution in the previous time instant that was denoted 
by fht-i, by applying the shrinkage coefficient matrix A to the obtained m[ j K The processes 
^t-i|t-i anc * Ct^ are conditionally independent given observations up to time t. Moreover, R? = 
V /trace(^ ) ^ )T ) where S? = ^^-i) " £ Zt? and eJ o (0<Vi) denotes 

the l-th element of the vector The term R? denotes a time-varying coefficient that 

determines the updating direction and is a positive scalar to ensure that the criterion (16) can be 
minimized by changing m? in the steepest descent direction. Therefore, the first step estimate of 
the a posteriori parameter estimation particle is denoted by 9^. The convergence of the update 
law (17a)-(17b) will be shown in the Subsection III-F. 

The parameter update law according to (17a)-(17b) contains a term in addition to the indepen- 

(i) 

dent normal increment Q . The estimated parameter from this update law is invoked in the PF- 
based parameter estimation filter to represent the distribution from which the parameter particle 
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population for the next time step is chosen. Therefore, the above proposed RPE-based modified 
artificial evolution law enables the PF-based estimation algorithm to handle and cope with the 
time- varying parameter scenarios. The time- varying term ^ t R t acts as an adaptive step size in 
equations (17a)-(17b), and therefore our algorithm can also be considered as an adaptive step 
size scheme. 

In order to ensure that the obtained 9^ from the modified artificial evolution law given 
by equations (17a)-(17b) remains in (refer to Assumption Al), the following projection 
algorithm is utilized that forces 9^ to remain inside D^- according to the following procedure 
[30], 

1) Choose a factor 0 < /i < 1, 



2) Compute 9% := lt R\ 3) ^ } e(t, ^ 1|t _ 1 ), 

t\t ' 



3) Construct m\ j) := 9^^ + 9 ( t j) 



4) If m[ j) £ Dj\f go to Step 6, else go to Step 5, 

5) Set 6$ = and go to Step 3, 

6) Stop. 

Consequently, the a priori distribution of the parameter 9 t is assumed to have the same 
distribution as in the previous time step. On the other hand, as the present observation y t becomes 
available in the measurement update step, the a posteriori distribution of the parameter is obtained 
through two steps that denoted by tt^ (d9 t ) and 7f^(d# t ), respectively. In what follows, more 
details related to these distributions are presented. 

Consider equations (17a)-(17b). The first step a posteriori distribution of the parameters 
calculated from the distribution of the parameter particles 9^ is given by, 

^,(d^) = ^E7=iV)W' < 18 ) 

u t\t J u t\t 

and the measurement equation is expressed as, 

y$=h t (x tlt ,6$), (19) 

where denotes the evaluated output that is obtained by the parameter estimation filter that is 
different from the one that is obtained by the state estimation filter, as provided in the Subsection 
III-C. 

Now, in the second step for estimating the a posteriori parameter estimate distribution, con- 
sider the the present observation y u so that the particle weights are updated by the likelihood 

June 21, 2015 DRAFT 



17 



function according to ~ VvXvt ~ Vt\t) = p(yt\%t\u ®t\t)- This can now ^ e ex P res sed by using 



the normalized weights wi^ as (d9 t ) = , wi j " > 5 2 ( j )(d9 t ), where wi^ = — ^N^tit) 
Following the resampling/selection step, an equally weighted particle distribution tt^ (d9 t ) is 
obtained as (d9 t ) = jr^2f =1 S^j)(d9 t ) for approximating ttq (d0 t ), and the resampled 
(selected) particles that are denoted by 0^ follow the distribution fr^ t (d9 t ). Therefore, the 
a posteriori parameter estimation distribution is approximated by a weighted sum of the Dirac- 
delta masses as fr^ (d0 t ) before one performs the resampling and with an equally weighted 
particle distribution approximation as (d9 t ) according to 

N 



-0-) a P(Vt\Zt\t,6§) 

W ft = Tr ) (20) 

N N 

< w*) = nI1 % w % = ^ E ^ > 

where denotes the normalized parameter particle weight, {0^}jli is obtained from the 
resampling/selection step of the scheme by duplicating the particles 0^ having large weights 
and discarding the ones with small values to emphasize the zones with higher a posteriori 
probabilities according to P(0^ = 0^) = w£\ k = 1, N. In our proposed filter the residual 
resampling method is used to ensure that the variance reduction among the resampled particles 
is guaranteed [40]. 

Therefore, an approximation to E(0(0 t )|yi :t , x t ) by (j){9 t ) = 9 t takes on the form 9 t \t ~ 
7r^(d0 t ) = ^2f =1 S§ti)(dO t ) 9 where 7r^(d0 t ) denotes the a posteriori distribution of the 
parameter estimate (after performing the resampling from 0^). The resulting estimated output 
of this filter is obtained by y t = h t (x t \ t) 9 t \ t ). The explicit details for implementation of the 
parameter estimation filter are now provided below. 

The Parameter Estimation Filter 

The particle filter for implementation of the parameter estimation is described as follows: 
1) Initialize the TV particles for the parameters as {0o}jLi ~ 7r<9 o (d0o), and use the initial values 
of the states as x 0 that represents the mean of the states initial distribution 7r Xo (dx 0 ). 



June 21, 2015 



DRAFT 



18 

2) Dr a w(i j) ~M(0,(I-AZ)V §t _ i]t J. 

3) Predict (9^, j = 1, iV from equations (17a)-(17b) with the projection algorithm. 

4) Compute the importance weights {w^}^ =1 , = p(y t \x t \t, ^\t)->j = 1, - iV, and normal- 
ize them to = * 

5) Resampling: Draw TV new particles with replacement for each j = 1, TV, P(0^ = 0^) = 
4f , fc = 1, TV, where 0$ ~ *£ |t (d0 t ) = £f =1 4? ^(dfc). 

6) Construct 6 t \t from the conditional distribution ir^ (d6 t ) = ^ Sjli <^O')(d0t) with equally 

weighted 0$ as 6 At = ^=Jtl 

7) Set t = t + 1 and go to Step 2 of the state estimation filter as provided in the Subsection 
III-C. 

As stated earlier, the kernel from which the parameter particles i.e. 9^ for the next time step 
is chosen is a Gaussian kernel and its mean is obtained from and its variance is obtained 
based on the kernel smoothing consideration that is provided in the next Subsection III-E. In the 
subsections below, the required conditions for boundedness of the parameter transition kernel 
K e (.) are also investigated and developed. 

E. Kernel Smoothing of the Parameters 

In this subsection, the kernel smoothing approach [20] is utilized to ensure that the variance 
of the normal distribution which is obtained according to the modified artificial evolution law 
for the parameter estimates remains bounded. 

(i) 

Consider the modified artificial evolution law (17a)-(17b) in which Q ' is a normal zero-mean 
uncorrelated random increment to the parameter that is estimated at time t — 1. If A = /, i.e. 
when there is no kernel shrinkage, as t —> oc, the variance of the added evolution increases and 
can therefore yield 9^ in (17b) completely unreliable. This phenomenon is known as the loss 
of information that can also occur between two consequent sampling times [20]. On the other 
hand, since 9 t is time-varying, generally there will not exist an optimal value for the variance 
of the evolution noise that remains suitable for all times. 

Consequently, the idea of the kernel shrinkage has been proposed in [20] and later updated 
in [21]. In the kernel shrinkage approach [29], for the next time step one takes the mean of 
the estimated parameter distribution in the particle filter according to the following normal 



June 21, 2015 



DRAFT 



19 



distribution 

K e {&6 t \6%x t ) ~ J\f(Am^ + (/ - A)m t _i, (/ - A 2 )^^), (21) 

where for j = 1, TV, is obtained from (17a). By utilizing this kernel shrinkage rule, the 
resulting normal distribution retains the mean fh t -i and has the appropriate variance for avoiding 

over-dispersion relative to the a posteriori sample. The kernel shrinkage forces the parameter 

(i) 

samples towards their mean before the noise Q is added. In our proposed approach the changes 
due to the parameter variations are considered in the mean of the parameter estimate distribution 
through the modified artificial evolution rule. Consequently, the mean of the distribution, i.e. 
m t -i, itself is time- varying and the kernel shrinkage ensures a smooth transition in the estimated 
parameters even when they are subjected to changes. To eliminate the information loss effect, 
by taking the variance from both sides of equation (17b) results in V§ i = A 2 V^ i + (J — 
A 2 )Vn = Vn . This ensures that the variance of the added random evolution would not 

7 Vt-l\t-l V t -l\t-l 

cause over-dispersion in the parameter estimation algorithm for all time. 

The following proposition specifies an upper bound on the shrinkage factor. This upper bound 
is calculated in the worst case, that is when the parameter is considered to be constant but the 
modified evolution law (17a)-(17b) is used in the parameter estimation filter for estimating it. 
Utilization of this upper bound in the kernel shrinkage algorithm ensures the boundedness of the 
variance of the estimated parameters distribution that is obtained according to the RPE-based 
artificial evolution update law and the kernel smoothing augmented with the shrinkage factor. 
Proposition 1: Upper bound on the kernel shrinkage factor. Given the parameter update law 
(17a)-(17b), the estimated parameters conditional normal distribution based on the kernel smooth- 
ing as given by equation (21), results in an upper bound for A that is obtained as A < 1(1 — 
cr mm (p v ) k ^ denotes the ib^ in equation (17a) but considered as a constant 

parameter between the time steps t and t — 1. Moreover, a min and a max denote the minimum and 
the maximum eigenvalues of a matrix, respectively, W denotes the upper-bound on the variance 
of the added noise W t9 V y denotes the upper-bound on the variance of the measurement noise 
Rt, Vq denotes the variance of the parameters when they are constant that can be assumed to be 



the same as the initial covariance of the parameters, and P max = 7oy trace (^ max ^ max T ), where 
7o denotes the initial value of the step size, and £ m ax£max T is a design parameter that denotes 
the maximum acceptable variance among the prediction error vector elements. 
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Proof: The proof of this proposition is provided in the appendix. ■ 
The convergence of the estimated parameter particles 0^}^ t _ v j = 1, TV to the local mini- 
mum of E( J(^ 1 | t _ 1 )|yi:t-i, x t ) is now investigated in the following subsection. The developed 
convergence proof does not ensure the convergence of the RPE-based parameter estimation 
method to the true parameter value, but only to a set of zeros of the gradient of the chosen 
performance index. The conditions under which the convergence of the estimated parameters to 
their optimal values can be guaranteed as TV —> oc is stated in Remark 1. 

E Convergence of the RPE-based Parameter Update Law 

In this subsection, it will be shown that the update law (17a)-(17b) can guarantee the conver- 
gence of the parameter estimate particles 6^}^ t _ v j = 1, N (after the resampling step), to a 
local minimum of E( x t ), that is located in a compact set of {x t , 0 t }, denoted 

by D^f as per Assumption Al. 

In order to investigate the convergence of our proposed RPE-based modified artificial evolution 
law for updating the parameter particles distribution and to achieve a local minimization of 
E(J(0t_i)|yi :t _i,£t), consider equation (17a), where j t denotes a time-varying step size such 
that lim^oo j t = Mo > 0, where /x 0 is a small positive constant. The introduction of the step 
size 7^ is necessary to transform the discrete-time model (17a)-(17b) into a continuous-time 
representation as shown subsequently. 

First, we need to state the following two assumptions A2-A3 according to [30], to guarantee 
the convergence of our proposed algorithm as presented in our main result below in Theorem 
1. Specifically, we have: 

Assumption A2. The function Q(e(t,0 t -i\t-i)) is sufficiently smooth and twice continuously 
differentiable w.r.t. e, and \Q €€ (e(t,6 t -i\t-i))\ < C for 6 t -i\t-i G D N , where Q €€ (e(t, <9t-i|t— i)) 
denotes the second derivative of Q(e(t,0 t -i\t-i)) w.r.t. e. 

Assumption A3. The observation sequence y t (generated from equation (2)), is such that 
E(Q(€(*,e t _i| t _i))) = M_i| t _i) and E[- r ^g(e(t^;_ 1 | t _ 1 ))] = -#(#Vi|t-i) exist for all 
0 t -i\t-i e D N , where E(Q(e(t,e t _i| t _i))) = ± EU« E Q(^ 4-i|t-i))- 

It must be noted that the kernel shrinkage method, as stated earlier, attempts to retain the mean 
of the parameter estimation particles at time t near the estimated parameter in the previous time 
step t — l 9 i.e. Therefore, in the following theorem the convergence properties of 
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is addressed. The main result of this section is stated below. 

Theorem 1. Consider the parameter estimation algorithm as specified by the equations (17a)- 
(19). Also consider the a posteriori parameter estimate as governed by equation (20). Let 
Assumptions Al to A3 hold. It now follows that the particles 9^}^ t _ v j 1, N, and con- 
sequently the distribution of the estimated parameter particles approximated by the particle filter 

< ii, M°^)> W 'P J converge either to the set D c = {^^J^V-i ^ ^^M-i|t-i) 
= 0, j — 1, N} or to the boundary of Dj^ as t —> oo. 

Proof: The proof of this theorem is provided in the appendix. ■ 
The main reason that our proposed dual state/parameter estimation method for its implemen- 
tation does not necessarily need more particles than the one that we needed for only the state 
estimation scheme, is illustrated by the result that one can extract and obtain from Theorem 
1. According to this theorem, RPE-based modified artificial evolution law enables each single 
particle to tend to D c . Therefore, even increasing the number of particles would not affect the 
convergence properties of the filter but it can certainly result in a more accurate state/parameter 
estimates. It was indicated earlier that the above theorem can only guarantee the boundedness 
of the estimated parameter distribution from the particle filters and not its convergence to the 
optimal distribution. Based on the results of Theorem 1 and Proposition 1 one can ensure that 
the probability density function and its related kernel Ko(dO t \9t-i,Xt-i) (in the particle filter) 
do remain bounded. 

Remark 1. Using the extended setting that is introduced in [41], and also by assuming that 
p(y t \x t) O t ) < oo, and K x (xt\xt-i,9t-i) < the boundedness of the parameter estimation 
transition kernel Ko(9 t \9 t -i, x t ) is also ensured from the Proposition 1 and Theorem 1. Therefore, 
the convergence of the proposed dual state/parameter estimation filter to their optimal values, 
for {x t ,9 t } £ Dm as N ^ oo can be investigated according to Theorem 3.1 that is provided in 
[41]. 

Our proposed dual state and parameter estimation scheme is an effective methodology for 
the purpose of fault diagnosis of nonlinear systems, where without loss of any generality one 
initiates operating the system from the healthy mode of operation. During the healthy operation 
of the system our proposed dual state and parameter estimation strategy can provide one with 
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an accurate and reliable information on the health parameters of the system. This information 
can then be readily used to perform the tasks of fault detection, isolation and identification, 
following the presence or injection of faults in the components of the system. In the following 
subsection, the application of our proposed approach in previous subsections for addressing the 
fault diagnosis problem of nonlinear systems is investigated. 

G. The Fault Diagnosis Formulation 

Determination and diagnosis of drifts in unmeasurable parameters of a system require an on- 
line parameter estimation scheme. In parametric modeling of a system anomaly or drift, generally 
it is assumed that the parameters are either constant or dependent on only the system states [42]. 
Hence, drifts in the parameters must be estimated through estimation techniques. 

In [43], various parameter estimation techniques such as least squares, instrumental variables 
and estimation via discrete-time models have been surveyed. The main drawbacks and limitation 
with such methods arise due to the complexity and nonlinearity of the systems that we are 
considering in this paper that render the parameter estimation here a nonlinear optimization 
problem that must be solved in real-time. In [44] a nonlinear least squares (NLS) optimization 
scheme is developed for only the fault identification of a hybrid system. 

Parameter estimation techniques that are used for fault diagnosis of system components 
generate residuals by comparing the estimated parameters that are obtained by either the ordinary 
least squares (OLS) or the recursive least squares (RLS) algorithms with parameters that are 
estimated under the initial fault free operation of the system [45]. 

The fault diagnosis problem under consideration in this work deals with obtaining an optimal 
estimate of the states as well as the time-varying parameters of a nonlinear system whose 
dynamics is governed by the discrete-time stochastic model, 

fft+i = ft(x t ,9jX(x t ),Lj t ), (22) 

y t = h t (x u 9j\(x t )) + v u (23) 

where f t : R Ux x R n ° x R Uu} — > R Ux is a known nonlinear function, 0 t e R n ° is the unknown 
and time- varying parameter vector that for a healthy system is set to 1, X t : R Ux — > R ne is a 
differentiable function that determines the relationship between the system states and the health 
parameters. The function h t : R Ux x R ne — > R Uy is a known nonlinear function, uj t and v t are 
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uncorrelated noise sequences with covariance matrices L t and V t , respectively. According to the 
formulation used in equations (22) and (23), the parameter 9 t is a multiplicative fault vector 
whose value is considered to be set equal to 1 under the healthy mode of the system operation. 

The model (22) and (23) is now used to investigate the problem of fault diagnosis (FD), 
which in this work is defined as the problem of fault detection , isolation , and identification 
(FDII) when the system health parameters are considered to be affected by an unknown and 
potentially time- varying multiplicative fault vector 9 t . The system health parameters are known 
functions of the system states, A(x t ), and the multiplicative fault vector 9 t is to be estimated. In 
other words, the a posteriori estimated parameter 9 t \t will be used to generate residual signals 
for accomplishing the fault diagnosis goal and objective. The required residuals are obtained 
as the difference between the estimated parameters under the healthy operational mode that is 
denoted by 0 O > an d the estimated parameters under the faulty operational mode of the system 
that is denoted by 9 t \ t as follows 

rt = 0o-V (24) 
It should be pointed out that the true value of the parameter is denoted by 9%, which is assumed 
to be unknown. 

For the implementation of our proposed fault diagnosis strategy that is constructed based 
on the previously developed state/parameter estimation framework, the parameter estimates will 
be considered as the main indicators for detecting, isolating, and identifying the faults in the 
system components. The residuals are generated from the parameter estimates under the healthy 
and faulty operational modes of the system according to equation (24). The estimation of the 
parameters under the healthy operational mode is determined according to, 

9 0 = argmax(-log(p(e 0 |2/i : r)) 3 ( 25 ) 

where p(9 0 \yi :T ) denotes the probability density (conditioned on the observations up to time 
T associated with the healthy data), that is obtained from the collected estimates and fitted 
to a normal distribution. The time window T is chosen according to the convergence time of 
the parameter estimation algorithm. The thresholds to indicate the confidence intervals for each 
parameter are obtained through Monte Carlo analysis that is performed under different single- 
fault and multi-fault scenarios. The estimated parameters 9 t \t are generated by following the 
procedure that was developed and proposed in previous subsections. 
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IV. Fault Diagnosis of a Gas Turbine Engine 

In this section, the utility of our proposed dual estimation framework when applied to the 
problem of fault diagnosis of a nonlinear model of a gas turbine engine is demonstrated and 
investigated. The performance of our proposed state/parameter estimation scheme will be eval- 
uated and investigated when the gas turbine is subjected to deficiencies in its health parameters 
due to injection of simultaneous faults. 

A. Model Overview 

The mathematical model of a gas turbine used in this paper is a single spool jet engine as 
developed in [34]. The four engine states are the combustion chamber pressure and tempera- 
ture, Pec and T C c, respectively, the spool speed S, and the nozzle outlet pressure Pnlt- The 



continuous-time state space model of the gas turbine is given as follows, 

Tec = —^—[(c p T c 9m c rh c + VccH u rh { - c p T C cO mT rh T ) - c v T C c(0m c m c + rh { - 0 mT m T )], 

C v TYl cc 

r)mechOm T m T c p (T C c - T T ) - 6 mc mcCp(T c - T d ) 



S = 



Mm) 2 



Pec 1 

^cc = —[(c P T c 6 mc m c + r)ccH u rhf - c p T C cO mT rh T ) - c Y T C c{0m c rn c + m f - 9 mT m T )} 

1 cc c Y m c 



+ 77 {d mc m c + m f - 9 mT m T ), 

Vcc 

Pm /3 

Pnlt = vy-(0m T rh T + — — -9 mc mc - m Nozz i e ). (26) 

For the physical significance of the model parameters and details refer to [34]. The five gas 
turbine measured outputs are considered to be the compressor temperature (yi), the combustion 
chamber pressure (y 2 ), the spool speed (y 3 ), the nozzle outlet pressure (y 4 ), and the turbine 
temperature (y 5 ), namely 

Vl = T c = T diffuser [l + - J—K-^S-) 2 ^ - 1]], 

VvcVC -Tdiffuzer 

V2 = Pec, V3 = S, y 4 = P NLT , ( 27 ) 

y 5 = Tcc[i-^T(i-(^)^]. 

^CC 

In order to discretize the above model for implementation of our proposed dual state/parameter 
estimation particle filters, a simple Euler Backward method is applied with the sampling period 

of T s 10 msec. 
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The system health parameters are represented by the compressor and the turbine efficiency, 
rjc and rj T , respectively, and the compressor and turbine mass flow capacities, mc and ra T , 
respectively. A fault vector is incorporated in the above model to manifest the effects of system 
health parameters that are denoted by 6 = [6 Vcl 9 mc , 6^ T , 9 mT ] T . By introducing a new parameter 
as 6> = the measurement equations (27) can be represented as smooth functions with 
respect to the fault parameters. Each parameter variation can be a manifestation of changes in 
the fault vector that is considered as a multiplicative fault type. All the simulations that are 
conducted in this section corresponds to the cruise flight condition mode. 

In order to demonstrate the effectiveness and capabilities of our proposed algorithms, we 
have also conducted simulation results corresponding to the well-known recursive maximum 
likelihood (RML) parameter estimation method based on PF [23], [24], [26] in order to provide 
comparative performance with our proposed RPE-based parameter estimation method. It should 
be noted that the conventional kernel smoothing method [29] is only capable of estimating the 
constant parameters, as a result it is not capable of tracking changes in the parameters after 
the occurrence of faults in the system health parameters. Therefore, the results corresponding to 
this method will not be presented in our simulations in this subsection. Moreover, the gradient 
free PF-based RML method [26] could not also yield in an acceptable performance in this 
application given the large number of tuning parameters that are associated with each parameter 
in this method. Therefore, the RML based on the direct gradient method is utilized for the 
purpose of performance comparison. 

In our schemes, the adaptive step size (P^ = 7ti?^) is defined as the product of the constant 
7 t (7 t = 0.9) with F$\ which is evaluated on-line from the trace of the prediction error covariance 
matrix that is estimated from the maximum likelihood method. On the other hand, the step 
size in the RML method was chosen as j t = 0.05 = const, that is obtained by trial and 
error. The residuals corresponding to the parameter estimates are also obtained. Based on the 
percentage of the maximum absolute error criterion, a convergence time of 2 seconds is obtained 
in simulations for estimating both the states and parameters corresponding to 25 Monte Carlo 
runs of simultaneous faults with severities ranging from 1% to 10% loss of effectiveness of the 
healthy condition magnitudes. 

To choose the number of particles for implementation of the state and parameter estimation 
filters, a quantitative study is conducted. Specifically, based on the mean absolute error (MAE%) 
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that was obtained at the steady state estimation process and by taking into account the algorithm's 
computational time, the number of particles is chosen as TV = 50 for both the state and parameter 
estimation filters in this application. Subsequently, it was confirmed that acceptable performance 
and convergence times are obtained. The shrinkage matrix is also selected as 0.937. The initial 
distributions (i.e., the mean and covariance matrices) of the states and parameters are selected 
to correspond to the cruise flight operational condition as provided in [34]. In what follows, the 
two main simulation scenarios for conducting the fault diagnosis investigation of the gas turbine 
engine are presented. 

Scenario I: Concurrent Faults in the Compressor and Turbine Health Parameters. In this sce- 
nario, the input fuel flow rate to the engine is changed by decreasing it by 2% from its nominal 
value one second after reaching the steady state condition. Next, the effects of concurrent faults 
in both the compressor and the turbine are studied by injecting sequential fault patterns affecting 
the system components. Specifically, at time t = 4 sec the compressor efficiency is reduced by 
5% (this represents the level of the fault severity), followed by at t = 9 sec the same fault type 
affecting the compressor mass flow capacity, and at t = 14 sec the same fault type affecting the 
turbine efficiency, and finally at t = 19 sec the same fault type is applied to the turbine mass 
flow capacity. 

For accomplishing the fault diagnosis task and for generating the residuals, the PF-based RML 
method with direct gradient method using two different number of particles i.e. TV = 50, and 
TV = 150, and our developed RPE-based method are considered. The results corresponding to 
changes in the fault parameters are depicted in Figure 2. The dotted lines depict the confidence 
bounds for residuals that are determined based on 50 Monte Carlo simulation runs under various 
concurrent and simultaneous single and/or multiple fault scenarios using the RPE-based method. 
By analyzing the residuals, the detection time of a fault in each component and its severity can be 
determined and identified. It follows from this figure that the constructed residuals corresponding 
to the RPE-based method and the RML method with TV = 150 almost do not exceed their 
confidence bounds subject to changes in the engine input (applied at t = 1 sec). On the other 
hand, the RML with fewer number of particles (N = 50) shows false alarms in all the residual 
signals as the fuel flow rate changes at t = 1 sec. It can be concluded that whereas the RPE-based 
method has consistent behavior in fault diagnosing the system components, the performance of 
the RML is closely dependent on the number of particles that are used. Therefore, as the number 
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Fig. 2. Residuals corresponding to the concurrent fault scenarios in the turbine and the compressor parameters. 



of particles increases for the RML method, the performance of the algorithm in fault diagnosing 
the system also improves. 

In order to obtain a quantitative measure on the precision of our proposed estimation algorithm 
the results related to the 5% fault severity in terms of the mean absolute error (MAE%) of 
estimates corresponding to the last 2 sec of simulations (following the algorithm convergence) 
after each change are provided and summarized in Table I. The state/parameter estimation MAE% 
for our proposed RPE-based method with TV = 50 and the RML method with N = 150 are 
presented. In this table, the i-th fault for i = 1, 4 denotes the last 2 sec of simulations after the 
i-th fault occurrence, and the first column refers to the healthy system before the fault occurrence. 

The results shown in Table I demonstrate that for the RPE-based method the maximum MAE% 
for the states is between 0.03% — 1.06% of their nominal values. In case of the health parameters, 
for r]c and rhc the maximum MAE% is around 0.91% and for ry T and m T it is around 0.98% 
of their nominal values. On the other hand, according to results presented in Table I (b), the 
maximum MAE% for the states corresponding to the RML method is between 0.1% — 1.16% 
of their nominal values. In case of the health parameters, the maximum MAE% ranges between 
0.8% — 2.8% of their nominal values, where both mass flow rates are estimated with higher 
MAE%. 

The MAE% for the estimated measurements (outputs) of the engine are also provided in 
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TABLE I 

State/Parameter MAE% in case of concurrent fault scenarios for (a) RPE-based method with N = 50 and 

(B) RML METHOD WITH N = 150. 



(a) (b) 



State 


No Fault 


lnd Fault 


2rd Fault 


3th Fault 


4th Fault 




State 


No Fault 


lnd Fault 


2rd Fault 


3th Fault 


4th Fault 


























^CC 


0.3529 


0.2097 


0.3614 


0.4336 


0.2374 




^CC 


0.5352 


0.6342 


0.3921 


0.8934 


0.5882 


N 


0.1473 


0.0761 


0.1087 


0.1624 


0.0296 




N 


0.0995 


0.0912 


0.1018 


0.2060 


0.1383 


T CC 


0.2683 


0.1674 


0.1678 


0.3838 


0.1155 




T CC 


0.2064 


0.2443 


0.2574 


0.5174 


0.3374 




0.8575 


0.5325 


0.3978 


1.0614 


0.3213 




P NL,T 


0.7181 


0.8112 


0.7771 


1.1603 


0.5666 




0.2702 


0.1785 


0.2749 


0.3879 


0.2107 




vc 


0.9268 


1.9195 


2.1698 


1.4508 


1.4651 


rh c 


0.6621 


0.4229 


0.3682 


0.9132 


0.2236 




rh c 


1.6338 


1.8037 


1.0761 


2.6062 


2.2717 




0.2865 


0.1648 


0.1743 


0.4885 


0.1873 




?7 T 


0.9252 


0.7876 


0.8714 


1.6517 


1.1411 


rh<-£ 


0.4744 


0.4557 


0.4889 


0.9757 


0.5037 






1.3719 


1.7858 


1.6162 


1.9666 


2.7653 



Table II for both the RPE-based and RML methods. From the results presented in Table II 
(a) one can conclude that the maximum MAE% for the temperatures (of the turbine and the 
compressor) corresponding to our proposed RPE-based method is less than 0.3%, and for the 
spool speed is less than 0.16%, and for the compressor pressure is less than 1.4%, and for the 
turbine pressure is less than 2.5%. On the other hand, the results presented in Table II (b) for 
the RML method show that the maximum MAE% for the compressor and turbine temperatures 
is less than 0.4% and 0.6%, respectively. For the spool speed the MAE% is less than 0.2%, 
and for the compressor pressure it is less than 1.5% and for the turbine pressure it is less 
than 2.5%. Consequently, the results presented in these two tables confirm that the RPE-based 
method outperforms the RML method significantly. The accuracy in the measurement estimation 
is an important aspect and factor given that from practical considerations the system states and 
parameters are unknown. Therefore, it is generally necessary to judge the estimation accuracy 
based on the output estimation error performance and behavior. 

In order to demonstrate and illustrate the precision of our proposed fault detection algorithm 
based on the dual state/parameter estimation scheme, at the end of this section a quantitative 
study is conducted by performing a confusion matrix analysis [46] in presence of various fault 
scenarios having different fault severities and in presence of the same level of process and 
measurement noise that are stated in [34] for the RPE-based method and the RML method 
implemented with N = 50 and TV = 150 particles. 

Scenarion II: Simultaneous Faults in the Compressor and the Turbine Health Parameters. In 
the second scenario, a simultaneous fault in all the 4 health parameters of the engine is applied 
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TABLE II 

Output estimation MAE% in case of concurrent fault scenarios for (a) RPE-based method with N = 50 

AND (B) RML METHOD WITH N = 150. 



(a) (b) 



Output 


No Fault 


lnd Fault 


2rd Fault 


3th Fault 


4th Fault 




Output 


No Fault 


lnd Fault 


2rd Fault 


3th Fault 


4th Fault 


























T C 


0.2893 


0.2319 


0.2749 


0.2805 


0.2357 




T C 


0.2902 


0.3240 


0.2956 


0.3985 


0.2991 


Pc 


1.3548 


1.2332 


1.2507 


1.4070 


1.1813 




Pc 


1.4012 


1.3755 


1.3030 


1.4779 


1.2902 


N 


0.1473 


0.0761 


0.1087 


0.1624 


0.0296 




N 


0.0995 


0.0912 


0.1018 


0.2060 


0.1383 


T T 


0.2034 


0.1857 


0.1911 


0.2804 


0.1322 




T T 


0.2181 


0.2461 


0.2122 


0.5786 


0.5206 


P T 


2.2231 


2.1696 


2.1839 


2.4577 


2.0783 




Pt 


2.3446 


2.3994 


2.1356 


2.5220 


2.2474 



at t = 9 sec. The compressor and the turbine efficiencies faults follow the pattern of a drift fault 
that starts at t = 9 sec and causes a 5% loss of effectiveness in the compressor efficiency by 
the end of the simulation time (i.e. at t = 19 sec), and a 3% loss of effectiveness in the turbine 
efficiency by the end of simulation time. Simultaneously, the mass flow rate capacities of both 
the compressor and the turbine are affected by a fault that causes a 5% loss of effectiveness. 

The residuals corresponding to the two previous estimation methods are provided in Figure 3. 
The simulations show that in case of changes in the engine input (applied at t = 1 sec) all the 
four RML method residuals with N = 50 particles exceed the thresholds. However, the RML 
method with even three times more particles as compared to our proposed RPE method has high 
false alarm rates similar to the first scenario for the concurrent faults. More quantitative analysis 
on the performance of the RML method that is compared to the RPE-based method is provided 
in the subsequent subsection. 

The results in Table III (a) show that for our proposed RPE-based method, the maximum 
MAE% for both state and parameter estimates are between 0.1% — 0.5% of their nominal values. 
However, in the worst case the post fault estimated MAE% of the m T is 0.47% of its nominal 
value. Moreover, in the results shown for the RML method in Table III (b) it follows clearly 
that the state estimation MAE% can be achieved within 0.1% — 0.8% of the nominal values, 
whereas the parameter estimation MAE% is achieved within 0.7% — 3% of the nominal values 
with higher error rates after the fault occurrence, specially in the compressor and turbine mass 
flow rate capacities. 

The MAE% measurement (output) estimate error given in Table IV (a) for the RPE-based 
method shows that after simultaneous fault occurrences the error increases when compared 
to their values before the fault occurrences. This is caused due to accumulation of parameter 
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Fig. 3. Residuals corresponding to the simultaneous fault scenarios. 



estimation errors while all the four parameters are affected by a fault. On the other hand, the 
results corresponding to the output estimation as given in Tables IV(a)-(b) show that with the 
exception of the turbine pressure, our RPE-based method outperforms the RML method for 
estimating the other four measurement outputs. 

To summarize, our proposed RPE-based fault diagnosis algorithm is capable of detecting, 
isolating and estimating the component faults of a gas turbine engine with an average accuracy 
of 0.3% for the compressor and 0.5% for the turbine faults. In contrast the RML algorithm is 
capable of achieving the performance of an average 3% for the compressor and 1.6% for the 
turbine faults. 

Fault Diagnosis Confusion Matrix Analysis 

Finally, in this subsection a quantitative study is performed by utilizing the confusion matrix 
analysis [46] to evaluate the increase in the false alarms and/or misclassification rates of the 
faults in our considered application when the fault diagnosis algorithm is implemented by the 
RML with N = 50, the RML with N = 150, and the RPE-based method with N = 50 
particles. The thresholds corresponding to each algorithm are determined from 25 Monte Carlo 
runs on simultaneous fault scenarios that are not necessarily the same for the three algorithms. 
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TABLE III 

State/Parameter MAE% in case of simultaneous fault scenarios for (a) RPE-based method with N = 50 

AND (B) RML METHOD WITH N = 150. 





fa) 






(b) 




State 


Before Fault 


After Fault 




State 


Before Fault 


After Fault 














^CC 


0.2217 


0.2372 




^CC 


0.4865 


0.4383 


N 


0.0535 


0.1061 




N 


0.1025 


0.1053 


T CC 


0.2086 


0.1928 




T CC 


0.2247 


0.1888 




0.3970 


0.4291 






0.7540 


0.5264 


vc 


0.1735 


0.1821 




vc 


0.9295 


1.7956 


rh c 


0.2811 


0.3293 




rh c 


1.7291 


2.9729 




0.1016 


0.1485 




T) T 


0.7306 


1.0040 




0.4589 


0.4744 






1.4290 


1.5923 



TABLE IV 

Output estimation MAE% in case of simultaneous fault scenarios for (a) RPE-based method with N = 50 

AND (B) RML METHOD WITH N = 150. 

(a) (b) 



Output 


Before Fault 


After Fault 




Output 


Before Fault 


After Fault 


T C 


0.2207 


0.2852 




T C 


0.2482 


0.2832 




1.2926 


1.3729 






1.4051 


1.4369 


N 


0.0535 


0.1027 




N 


0.1025 


0.1053 


T T 


0.1565 


0.1650 




T T 


0.2058 


0.1774 


P T 


2.1210 


2.2348 




P T 


2.1413 


2.1580 



The confusion matrix data is obtained by performing simulations for another 35 Monte Carlo 
simultaneous fault scenarios having different fault severities and in presence of the same process 
and measurement noise covariances corresponding to 50% of the nominal values of the process 
and measurement noise covariances (according to [34]). In these scenarios, at each time more 
than one of the system health parameters are affected by component faults. 

The results are shown in Tables V(a)-V(c) corresponding to the RML method with N = 50, 
the RML method with N = 150, and the RPE-based method with N = 50 particles, respectively. 
In these tables the rows depict the actual number of fault categories that are applied and the 
columns represent the number of estimated fault categories. The diagonal elements represent the 
true positive rate (TP) for each fault occurrence. The accuracy (AC), precision (P), and the 
false positive rate (PP) of the three algorithms are also evaluated from the confusion matrix 
results according to the following formulae [46], 

AC= p= ] " JJ , f>. = PP=S^, 
V V c- V c- V ck- 

where c^-, z, j = 1, 5 denote the elements of the confusion matrix. In Table VI, the confusion 
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TABLE V 

Confusion matrix for (a) RML method with N = 50, (b) RML method with N = 150, and (c) RPE-based 

METHOD WITH N = 50. 

(a) (b) 
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(c) 
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matrix results according to the above metrics for the Tables V(a)-V(c) are provided. The results 
demonstrate that as the number of particles in the RML method increases from 50 to 150, 
the accuracy of the fault diagnosis algorithm does increase by 20%, and the false positive 
rate does decrease by 20%. The precision of the algorithm also increases by 20% for almost 
all the four health parameters of the system. However, the RPE-based method with N = 50 
outperforms the RML method significantly in terms of higher accuracy, lower false positive rate, 
and higher precision for all the four health parameters of the gas turbine engine. 

TABLE VI 

Confusion matrix Analysis results. 



Noise Level 


AC% 


FP% 


Pr ?c % 


Prh c % 


Pr] T % 


Prh T % 


RML Method with 50 Particles 


58.86 


31.43 


67.74 


51.11 


53.85 


56.0 


RML Method with 150 Particles 


78.86 


11.43 


87.50 


72.97 


74.29 


72.22 


RPE-based Method with 50 Particles 


86.29 


5.71 


93.94 


93.75 


77.78 


74.36 



V. Conclusion 

In this paper, a novel dual estimation filtering scheme is proposed and developed based on 
particle filters (PF) to estimate a nonlinear stochastic system states and time variations in its 
parameters. The dual structure is based on the extension of the Bayesian parameter estimation 
framework. A dual structure is proposed for achieving simultaneous state and parameter estima- 
tion objectives. Performance results of the application of our method to a gas turbine engine under 
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healthy and faulty scenarios are provided to demonstrate and illustrate the superior capability 
and performance of our scheme for a challenging fault diagnostic application as compared to the 
well-known recursive maximum likelihood (RML) method based on particle filters for parameter 
estimation objective. Whereas the computational complexity of both algorithms remains the same 
at 0(N reg x TV), our proposed dual estimation method requires much less number of particles 
for its implementation as compared to the recursive maximum likelihood method. On the other 
hand, the false alarm rates of our proposed dual algorithm is significantly lower than the RML 
method. These two main characteristics justify and substantiate the observation that our proposed 
algorithm is more suitable for the purpose of fault diagnosis of critical nonlinear systems that 
require lower fault detection times and false alarm rates. Moreover, the estimation results accuracy 
in terms of the fault identification are also provided. The obtained results are demonstrated and 
validated by performing a confusion matrix analysis. 
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APPENDIX 

Proof of Proposition 1: Let us consider the modified artificial evolution law by assuming 
that A = / in equation (17b). Let us substitute from equation (17a) where the superscript 
(j) is omitted in order to define the modified artificial evolution law in a more general form that 
is also applicable to each single particle as 

0 t \t = O t -i\t-i + Ptfaefa e t -i\t-i) + Ct, (28) 

where P t = jtRt- Now, let V(.\yi :t ) denote the variance of the stochastic process assuming that 
the observations up to time t are available, and C(.,.\yi :t ) denotes the covariance of the two 
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stochastic processes by assuming that the observations up to time t are available. By taking into 
account the relationship between the variance of both sides of equation (28) when the covariance 
matrix is assumed to be non-singular and when the prediction error at time t is uncorrected 
with the parameter estimate at time i, given that 0 t -i|t-i is independent of P t ^e t , therefore we 

get V(6 tlt \y 1:t ) = V((9 t -i|t-i|j/i:t) + P t 2 *V y * T + W t + 2C(0 t _ 1 | t _ 1 ^|y 1:t ) + 2C(P t tte t ^|j, 1:t ). 
Furthermore, since P t = ^ t R t = j t ^/tmce(S^S^) is a scalar, one can write V(P t ^e t \yi :t ) = 
mPt\yi*]) 2 V(*e t \ yi *) + (E[*e t \ yi *]) 2 V(P t \y 1:t ) + V(P t \y 1:t )V^e t \y 1:t ) = P t 2 W t tt T . 

In order to ensure that there is no information loss (particularly in the case that 9 t is constant), 
one must have, V(6 t \ t \y 1:t ) = V (9 t -i\t-i\yi:t) = v e t _ 1]t _^ which implies that, C(§t-iXt\Y t ) + 
C(P t ^e t Xt\Yt) = —\Wt~ \P^V t ^ T . Therefore, negative correlations are needed to remove 
the effects of unwanted information loss. In case of approximate joint normality of the stochastic 
process (O t -\\t-i,Ct\Yt) and (P t ^€ t , (t\Y t ), the conditional normal evolution is obtained as 

p{0 t \t\0 t -i\t-i) ~ M{O t \t\AA\t + (I ~ A t )0 t -i| t -i, (/ - A 2 t )V §t _ iit J, (29) 

where the mean of this Gaussian distribution at each time step is found from equation (17a), 
when 9 t -i\t-i is substituted by its modified version according to the shrinkage kernel. The 
shrinkage matrix A u is obtained as A t = I - [l(W t V^ + P?WV y W T V^ )]. Assuming 
that in the kernel shrinkage method, the variance of the evolution noise is interpreted as W t = 
(I — At)V§ t i? therefore by replacing W t in the equation that was obtained for A t results in 

A t = I-\[{I- A 2 t ) + P?W* T V e -^ t J. (30) 

Let us assume that our main goal is to obtain and determine the shrinkage matrix A t as 
A = a/, therefore the matrix equation (30) can be written as 

( a 2 _ 2a + 1)1 - PfW^Vg-^ = 0. (31) 

We are interested in obtaining an upper bound for the shrinkage matrix that can be used for 
all time. Assuming that the last term in the right hand side of equation (30) has an upper bound 
given by {P^V^Vf^J < P^V y V T Vf\ where P max = 7 o0race(£: max £ tj with 7o 
denoting the initial step size, therefore £ max is considered as the maximum acceptable variance 
of the prediction error, V y is an upper bound of the measurement noise covariance, and V§ is 
the minimum bound of the parameter estimation covariance that is considered to be similar to 
the initial covariance of the parameters (before adding the evolution noise in time). Therefore, a 
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bound on al and consequently A can be obtained as A = al < I — \ amm ^ ax ^^ ^ j w h ere 
the normalization of the e igenvalue is perform ed to ensure that the associated fraction remains 
less than 1. Let a = 1 — \ ^ mm ^ ax ^^ T ^_^ ? therefore, the shrinkage matrix becomes A = al. 
The smoothing matrix corresponding to the normal distribution variance is now obtained from the 
shrinkage factor as (1— a 2 )I. This guarantees that the distribution (29) has a finite variance as t —> 
oc for both constant and time- varying parameter estimation cases. This completes the proof of the 
proposition. ■ 
Proof of Theorem 1: The existence of the projection algorithm in the parameter estimation 
scheme ensures that 9^ remains inside Djsf. According to equation (20), the a posteriori 
estimate of the parameter at time t is obtained from the resampled particles of the parameter 
estimate 9^ 9 as 9 t \ t = YtjLi ®t\h where 9^ is selected from the TV particles of 9^ for which 



PvXvt — h(x t \t, ®t\t)) yields higher probabilities. In order to avoid the discontinuity that is caused 



by resampling, in this procedure only the particles that are maintained from time i.e. 9^ 



and are propagated to time t as 0^, are considered. However, the rest of the particles that are 
to be discarded in the resampling process will be replaced by the kept particles. Therefore, the 

results can be generalized to all the particles. 

50') 



Consider equation (I7b) for generating 9^J and let us substitute m\ ] from the RPE-based 

t\t ' 



update rule of (I7a)-(l7b) to obtain the following expression for the resampled particles 9^ 



namely 



i N 

= A ^t%-i + V- A )ij2 § t-Mt-i + A ^R{ 3) 4 3) ^ + VT^T^\ (32) 



N 1 

1=1 



where y/I — A 2 Q J) denotes the evolution noise by taking into acount the kernel smoothing 
concept. By applying the sum operator to both sides of equation (32) to construct 9 t \ t yields, 

^ N ^ N N N N N 

77 Y Ot\t = A ^ Ot-i\t-i + 77 77 Y dt-i\t-i ~ A ^ Y 77 Y K 3 -\\t-\ 

j — 1 j = l i—\ i—\ i—1 i—1 

N N N N 

j=l j=l j=l j=l 

which results in 9 t \ t = 9 t -i\t-i + A jj Y^f=i 7t^?V? )e (*? ^-i|t-i)- Assumptions Al and A2 
ensure that the regularity conditions are satisfied according to [30]. Consequently, a differential 
equation associated with (17a)-(17b) can be obtained by considering that At is a sufficiently 
small number and t, i are specified such that Ylk=t 7& = Through a change of time-scales 
as t —> r and i —> r + At, for a sufficiently small At, and by assuming that 9 t -i\t-i = 0, = 
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, A = al is a constant matrix, the difference equation for 9^ is now expressed as 

§U) ~ 0ii) + fl A#'(/(0 w ), (33) 

where g(9^) = ^ J^jjz* ip^e(k, 9[ j } 1 ^_ 1 ). In the above derivation it is assumed that the 6^ 
particle is kept after resampling (that is —> 0$ ~~ ^ @t\t)- Consequently, considering 

Assumption A3, the differential equation associated with the evolution of each single particle is 
obtained as, 

^ = aR<£\T)g{§<g>(T)) = -oflg>(r)[^L J((9g>)] T , (34) 

where the subscript D is used to differentiate the solution of the differential equation (34) from 
the solution of the difference equation (33). Now, the required convergence analysis is reduced 
to investigating the properties of the deterministic continuous-time system (34). 

Consider the positive definite function L(^ ) 1|t _ 1 ) = E(J(^ t ( i ) 1|t _ 1 )) = ^EjiM-V-i) 
that represents the expectation of a positive definite function through TV data points for 9^}^ t _ v 
Our goal is to evaluate the derivative of this function along the trajectories of the system (34). 
According to Assumption A2, the second derivative of Q(e(t,9 ( t j } 1 ^ t _ 1 )) is bounded, therefore 
the summation and derivative operations commute. According to Assumption A3 for 9^}^^ e 

D M , J(6$(t)) = ^r— M-Vi)U> =*« = ^d^WA-V-i))' exists and is 

uc t-l|t-l £-l|£-l t-i|t-i 

approximated by -g(9 {j) ). Therefore, let us define V{9^) = E(J(0%\t)) 9 and given that a > 0, 
and Rp\r) is a positive scalar for j = 1, TV (which represents the trace of a positive definite 
matrix at time r), one gets 

where the equality is obtained only for 9 D {r) e A:- Therefore, as i — >► oc either 9^2i\ t -i and con- 
sequently, 7r^_ 1|t _ 1 w.p.l tends to Z?c or to the boundary of Dyv, where w.p.l is with respect to the 
random variables related to the parameter estimate particles. It should be noted that for particles 
that have been replaced in the resampling this equality is valid since they are replaced by particles 
that have satisfied (35). This completes the proof of the theorem. 



June 21, 2015 



DRAFT 



